Superstore Sales & Profit Analysis

The data describes a retail chain operating in the USA and Canada. Retail chains sell goods or services to customers through multiple distribution channels to earn a profit, meeting the demand they identify through a supply chain. The industry runs on sales of typical product lines of grocery items and merchandise, such as food, pharmaceuticals, apparel, games and toys, hobby items, furniture and appliances. Analyzing such an industry is valuable because it provides insight into the sales and profits of various products. Our analysis of this USA & Canada retail chain uses PCA and K-means clustering.

Columns:

Order Date: Date on which a customer places the order (manually separated into month, day, and year).

Ship Date: Date on which the order is shipped.

Ship Mode: Mode of shipment of each order.

Customer ID: ID assigned to each customer who places an order.

Customer Name: Name of Customer.

Segment: Customer segment from which the order is placed.

City: US cities.

State: US states.

Postal Code: A code used for sorting mail.

Region: USA regions.

Product ID: Product ID of each product.

Category: Category to which each product belongs.

Sub-Category: Sub-category within each category.

Product Name: Name of products.

Sales: Selling price of each product.

Quantity: Number of units ordered of a particular product.

Discount: Discount applied on each product.

Profit: Profit gained on each product.

Dataset Sample

Product Analysis

We observe that products priced at roughly 1,500 USD or less accounted for the majority of sales.

We see that the Technology category had the biggest sales revenue.

We see that the Phone subcategory under Technology had the biggest sales revenue. The most profitable subcategory is Copier, also under Technology.

We observe that one of the most frequent words in product names is Xerox (a US company that sells print and digital document products), which suggests that print and digital document products are bought very frequently.

We observe that three products kept their place among the top four best sellers throughout the four years. Those are:

Correlation Matrix

The correlation matrix shows the correlation between the amount customers spent on one product and the amount they spent on other products in the US. We observe that there is no significant correlation between most of the products. We explore Canada next.

The correlation matrix shows the correlation between the amount customers spent on one product and the amount they spent on other products in Canada. We observe that there is a significant correlation between most of the products.
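As a minimal sketch of how such a matrix can be built, the snippet below computes pairwise Pearson correlations between per-customer spending columns. The column names and all the numbers are made up for illustration, and Pearson correlation is an assumption; the analysis above does not state which coefficient was used.

```python
import math

# Hypothetical per-customer spending (USD) on three product groups.
spending = {
    "Paper":   [10.0, 20.0, 30.0, 40.0, 50.0],
    "Binders": [12.0, 18.0, 33.0, 41.0, 48.0],
    "Phones":  [300.0, 120.0, 250.0, 90.0, 200.0],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def correlation_matrix(columns):
    """Return {a: {b: r}} holding the correlation of every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

corr = correlation_matrix(spending)
for name, row in corr.items():
    print(name, [round(r, 2) for r in row.values()])
```

Values close to 1 or -1 indicate that spending on the two products moves together (or in opposite directions), while values near 0 indicate little linear relationship.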

PCA Scree Plot

From the scree plot we can see that PC1 accounts for almost 36% of the variation and PC2 for almost 7%.
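The percentages on a scree plot are the eigenvalues of the data's covariance matrix divided by their sum. The sketch below illustrates this on a tiny made-up table, using only the standard library (power iteration with deflation) rather than whichever PCA routine produced the plot. The data is only centered, not scaled, which is an assumption, and the numbers are hypothetical, so the output does not reproduce the 36% and 7% figures.

```python
import math

def covariance_matrix(rows):
    """Sample covariance matrix; each row is one observation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpair(m, iterations=1000):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    v = [1.0 + 0.1 * i for i in range(len(m))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # matrix is numerically zero
            break
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
    return value, v

def explained_variance_ratio(rows):
    """Share of total variance carried by each principal component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # trace equals the sum of eigenvalues
    ratios = []
    for _ in range(p):
        value, v = top_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: subtract the component just found before searching for the next.
        cov = [[cov[i][j] - value * v[i] * v[j] for j in range(p)] for i in range(p)]
    return ratios

# Hypothetical per-customer spending on (paper, binders, phones).
rows = [(10, 12, 300), (20, 18, 120), (30, 33, 250), (40, 41, 90), (50, 48, 200)]
ratios = explained_variance_ratio(rows)
print([round(r, 3) for r in ratios])
```

Plotting these ratios in order, one bar per component, gives the scree plot.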

PCA Individuals & Variables Plots

Variables (Loading) Plot

Individual Plot

Variables Contribution to PC1 & PC2

Customer Analysis

K-Means Clustering

Following the elbow method, we observe that the optimal number of customer clusters is two.

Based on the PCA grouping of individuals, one group tended to spend less. To see whether this lower spending is related to age, we plotted age against paper spending, since paper contributed the most to PC1.

We see that the West region is the most profitable region, with its profit base coming from Consumers through Technology.

Ten was a very common number of purchases among customers.

We observe that quantities of 2 and 3 were very frequent among customers, especially consumers.
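The elbow method runs K-means for several values of k and looks for the point where the within-cluster sum of squares stops dropping sharply. The following is a sketch under stated assumptions: the points are invented 2-D coordinates standing in for customers, and the initialization is a simple deterministic farthest-point rule chosen for reproducibility, not necessarily what the original analysis used.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans_inertia(points, k, max_iter=100):
    """Run Lloyd's algorithm and return the within-cluster sum of squares."""
    centers = farthest_point_init(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean_point(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical customers forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3)]

inertia = {k: kmeans_inertia(points, k) for k in range(1, 5)}
for k, value in inertia.items():
    print(k, round(value, 3))
```

On this toy data the value falls steeply from k = 1 to k = 2 and barely changes afterwards, which is the "elbow" that suggests two clusters.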

Time Series Analysis

Calendar Maps

We see that Saturday, 17 December 2016 and Thursday, 23 March 2017 had the highest number of sales. White cells indicate that no sales activity took place.

We see that total sales remained roughly the same during 2014 and 2015, followed by a significant jump in the years after. This holds for all categories except Furniture.

We see that the First Class ship mode was consistently associated with the highest profits throughout the years.

This animated chart allows a closer look at the ranking of states by sales revenue, day by day, from 2014 to 2017.
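A calendar map is essentially a count of orders per day laid out on a calendar grid. As a rough illustration, the snippet below counts orders per date and picks the busiest day. The list of order dates is invented for the example (only the two dates named above are borrowed, with made-up counts), so it shows the aggregation step, not the real figures.

```python
from collections import Counter
from datetime import date

# Hypothetical order dates, one entry per order line.
order_dates = [
    date(2016, 12, 17), date(2016, 12, 17), date(2016, 12, 17),
    date(2017, 3, 23), date(2017, 3, 23),
    date(2017, 3, 24),
]

# Number of orders on each day; days without orders are simply absent.
daily_orders = Counter(order_dates)

busiest_day, order_count = daily_orders.most_common(1)[0]
print(busiest_day.isoformat(), busiest_day.strftime("%A"), order_count)

# A day with no activity would be drawn as a white cell on the calendar.
has_activity = date(2016, 12, 18) in daily_orders
print(has_activity)
```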

Location Analysis

We see that there are four groups of states.

We see that the State of California had the biggest total sales revenue, followed by New York, Texas, and Washington, respectively.

We see that a few cities accounted for most of the sales revenue, such as Seattle, Los Angeles, San Francisco and Philadelphia.

We see that numerous cities had a large volume of orders, yet San Francisco, Los Angeles, Seattle and Philadelphia had the largest volumes.

We see that the most dominant mode of shipping across the USA is Standard class, regardless of the volume of orders.

We observe that Technology is the least bought category in nearly all states. North Dakota is an exception: there, Office Supplies was the least bought category.
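One way to derive such a per-state result is to total purchases by state and category and then take the minimum within each state. The sketch below assumes "least bought" means the smallest total quantity, which may differ from the measure behind the map, and the order lines are hypothetical.

```python
from collections import defaultdict

# Hypothetical (state, category, quantity) order lines.
orders = [
    ("California", "Technology", 2),
    ("California", "Furniture", 5),
    ("California", "Office Supplies", 9),
    ("California", "Technology", 1),
    ("North Dakota", "Technology", 4),
    ("North Dakota", "Furniture", 3),
    ("North Dakota", "Office Supplies", 1),
]

def least_bought_by_state(rows):
    """Map each state to the category with the smallest total quantity."""
    totals = defaultdict(lambda: defaultdict(int))
    for state, category, quantity in rows:
        totals[state][category] += quantity
    return {state: min(cats, key=cats.get) for state, cats in totals.items()}

least_bought = least_bought_by_state(orders)
for state, category in least_bought.items():
    print(state, "->", category)
```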